Abstract

We show that disjointness requires randomized communication $\Omega\left(\frac{n^{1/(k+1)}}{2^{2^k}}\right)$ in the general
$k$-party number-on-the-forehead model of complexity. The previous best lower bound for $k \ge 3$
was $\frac{\log n}{k-1}$. Our results give a separation between nondeterministic and randomized multiparty
number-on-the-forehead communication complexity for up to $k = \log\log n - O(\log\log\log n)$
many players. Also, by a reduction of Beame, Pitassi, and Segerlind, these results imply
subexponential lower bounds on the size of proofs needed to refute certain unsatisfiable CNFs in a
broad class of proof systems, including tree-like Lovász-Schrijver proofs.

1 Introduction 

Since its introduction thirty years ago [Abe78, Yao79], communication complexity has become a 
key concept in complexity theory and theoretical computer science in general. Part of its appeal is 
that it has applications to many different computational models, for example to formula size and 
circuit depth, proof complexity, branching programs, VLSI design, and time-space trade-offs for 
Turing machines (see [KN97] for more details). 

One area of communication complexity which still holds many mysteries is the $k$-party
"number-on-the-forehead" model, originally introduced by Chandra, Furst, and Lipton [CFL83]. In this
model, $k$ parties wish to compute a function $f : (\{-1,+1\}^n)^k \to \{-1,1\}$. On input
$(x_1, \ldots, x_k)$, the $i$th player receives $(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_k)$. That is, player $i$ has
knowledge of the entire input except for the string $x_i$, which figuratively can be thought of as
sitting on his forehead. The players communicate by writing messages "on a blackboard," so that all
players see each message. The large overlap in the players' knowledge is part of what makes showing
lower bounds in this model so difficult. This difficulty, however, is rewarded by the richness and
strength of the consequences of such lower bounds: for example, by results of [HG91, BT94], showing a
super-polylogarithmic lower bound on an explicit function for polylogarithmically many players would
give an explicit function outside of the class $\mathrm{ACC}^0$, that is, a function which requires
super-polynomial size constant-depth circuits using AND, OR, NOT, and modulo $m$ gates.

While showing such bounds remains a challenging open problem, we do know of explicit functions
which require large communication in this model for $\Theta(\log n)$ many players. Babai, Nisan,
and Szegedy [BNS89] showed that the inner product function generalized to $k$ parties requires
randomized communication $\Omega(n/4^k)$, and for other explicit functions slightly larger bounds of size
$\Omega(n/2^k)$ are known [FG05]. These lower bounds are all achieved using the discrepancy method, a
very general technique which gives lower bounds even on randomized models with error probability
close to 1/2, and also on nondeterministic communication complexity.

For some basic functions, however, there is a huge gap in our knowledge. One example is
the disjointness function, or equivalently its complement, set intersection. In the set intersection
problem, the goal of the players is to determine if there is an index $j$ such that every string $x_i$
has a $-1$ in position $j$, where here and throughout the paper we interpret $-1$ as 'true.' The best
known protocol has cost $O(k^2 n \log(n)/2^k)$ [Gro94]. On the other hand, the best lower bound in
the general number-on-the-forehead model is $\Omega\left(\frac{\log n}{k-1}\right)$, for $k \ge 3$ [Tes02, BPSW06]. For $k = 2$ tight
bounds are known of $\Theta(n)$ for randomized communication complexity [KS87] and $\Theta(\sqrt{n})$ for
quantum communication complexity [Raz03, AA05].

A major obstacle toward proving better lower bounds on set intersection is that it has a low cost 
nondeterministic protocol. In case there is a position where all players have a $-1$, with $O(\log n)$
bits a prover can send the name of this position and the players can then verify this is the case. 
Since the discrepancy method is also a lower bound on nondeterministic complexity, it is limited 
to logarithmic lower bounds for set intersection. Even in the two-party case, determining the 
complexity of set intersection in the randomized and quantum models was a long-standing open 
problem, in part for this reason. 

In the multiparty case, the discrepancy method is the only technique which has been used 
to show lower bounds on the general randomized model of number-on-the-forehead complexity. 
Although other two-party methods can be generalized to the multiparty number-on-the-forehead 
model, they can become very difficult to handle. One source of this difficulty is that, whereas in 
the two party case we can nicely represent the function f(x,y) as a matrix, in the multiparty case 
we deal with higher dimensional tensors. This makes many of the linear algebraic tools so useful 
in the two-party case inapplicable or at least much more involved. For example, while matrix rank 
is a staple lower bound technique for deterministic two-party complexity, in the tensor case even 
basic questions like the maximum rank of an $n \times n \times n$ tensor remain open.

Besides this technical challenge, additional motivation to studying the number-on-the-forehead 
complexity of disjointness was given by Beame, Pitassi, and Segerlind [BPS06], who showed 
that lower bounds on disjointness imply lower bounds on a very general class of proof systems, 
including cutting planes and Lovász-Schrijver proof systems.

We show that disjointness requires randomized communication $\Omega\left(\frac{n^{1/(k+1)}}{2^{2^k}}\right)$ in the general
$k$-party number-on-the-forehead model. This separates nondeterministic and randomized multiparty
number-on-the-forehead complexity for up to $k = \log\log n - O(\log\log\log n)$ many players. Also,
by the work of [BPS06], this implies subexponential lower bounds on the size of proofs needed
to refute certain unsatisfiable formulas by tree-like proofs in Lovász-Schrijver and more powerful
proof systems.

Chattopadhyay and Ada [CA08] have independently obtained similar bounds on disjointness 
using similar techniques. 

1.1 Related work 

For restricted models of computation, bounds are known which are stronger than ours. Wigderson
showed that for one-way three-party number-on-the-forehead protocols, disjointness requires
communication $\Omega(n^{1/2})$ (this result appears in [BHK01]). More recently, Viola and Wigderson
[VW07] extended this approach to show a bound of $\Omega(n^{1/(k-1)}/k^{O(k)})$ on the complexity of
one-way $k$-party protocols computing disjointness. These results actually show bounds on a pointer
jumping function which reduces to disjointness.

Beame, Pitassi, Segerlind, and Wigderson [BPSW06] devised a method based on a direct product
theorem to show an $\Omega(n^{1/3})$ bound on the complexity of three-party disjointness in a model
stronger than one-way where the first player speaks once, and then the two remaining players
interact arbitrarily.

Following up on our work, David, Pitassi, and Viola [DPV08] gave an explicit function which
separates nondeterministic and randomized number-on-the-forehead communication complexity
for up to $\Omega(\log n)$ players. They are also able, for any constant $c$, to give a function computable
in $\mathrm{AC}^0$ which separates them for up to $c \log\log n$ players. Note that disjointness can be computed
in $\mathrm{AC}^0$, but that our bounds are already trivial for $\log\log n$ players. Even more recently, Beame
and Huynh-Ngoc [BHN08] have shown a bound of $2^{\Omega(\sqrt{\log n}/\sqrt{k}) - k}$ on the $k$-party
number-on-the-forehead complexity of disjointness. This bound remains non-trivial for up to
$\Theta(\log^{1/3} n)$ many players, but is not as strong as our bound for few players.

1.2 Overview of techniques 

There is a natural correspondence between functions $f : (\{-1,+1\}^n)^k \to \{-1,1\}$ and sign
$k$-tensors. Sometimes it is more convenient to consider the function form, and sometimes, as when
discussing norms, it is more convenient to consider tensors.

Our proof combines two ingredients. The first of these is the notion of an approximation norm.
For a norm $\Phi$ and a sign tensor $A$, the approximation norm associated to $\Phi$ and $A$, denoted
$\Phi^\alpha(A)$, is the smallest $\Phi$ norm of an element 'close' to $A$. Here $\alpha$ quantifies the term 'close.'

Approximation norms turn out to be quite useful for showing lower bounds on randomized 
and quantum communication complexity [Kla01, Raz03, LS07]. Razborov, for example, uses the
approximation trace norm to prove a tight lower bound on the quantum communication complexity 
of set intersection. 

We use what we call the cylinder intersection norm, denoted $\mu$. This norm can be seen as
a multiparty generalization of a quantity used in Lemma 3.1 of Klauck [Kla01]. As a correct
deterministic protocol partitions the communication matrix into rectangles on which the function
is constant, analogously a correct deterministic number-on-the-forehead protocol decomposes
the communication tensor into cylinder intersections on which the function is constant. Roughly
speaking, $\mu(A)$ measures how efficiently $A$ can be written as a sum of cylinder intersections. In
this way, if $A$ has low communication complexity, it will also have low $\mu$ norm. We defer formal
definitions to Section 3.

We denote the approximate version of the cylinder intersection norm by $\mu^\alpha$, where $1 \le \alpha < \infty$
represents the measure of approximation. This measure provides a lower bound on randomized
communication complexity in the number-on-the-forehead model. The limiting case $\mu^\infty(A)$ turns
out to be exactly the usual discrepancy method. For bounded $\alpha$ we obtain a technique which is
strictly stronger than the discrepancy method.

Following [LMSS07, LS07], to show lower bounds on $\mu^\alpha(A)$, we write it in terms of the dual
norm $\mu^*$. By definition of a dual norm, we have

$$\mu(A) = \max_Q \frac{\langle A, Q \rangle}{\mu^*(Q)}. \qquad (1)$$

This "max" formulation of $\mu$ is often more convenient for showing lower bounds. The dual norm
$\mu^*$ is closely related to discrepancy with respect to the uniform distribution, so we can use existing
techniques to upper bound $\mu^*(Q)$.

This formulation of $\mu$ also gives a way to write $\mu^\alpha$ in terms of a maximization quantity:

$$\mu^\alpha(A) = \max_Q \frac{(1+\alpha)\langle A, Q \rangle + (1-\alpha)\|Q\|_1}{2\mu^*(Q)}. \qquad (2)$$

All one needs for showing lower bounds is that the left hand side is at least as large as the right 
hand side. This can be shown quite simply using Equation 1 and elementary inequalities and was 
noted, for example, by Razborov in the context of the approximation trace norm. The fact that 
equality holds here requires the use of linear programming duality or a separation theorem for 
convex bodies and seems to be less well known. 

As the dual norm $\mu^*$ is essentially discrepancy with respect to the uniform distribution, the
approximation $\mu$ norm can be seen as an extension of discrepancy in another way. Instead of
proving that the tensor of interest $A$ has small discrepancy, it is enough to prove that there is a
tensor $Q$ which has small discrepancy and has large correlation with $A$, relative to $\|Q\|_1$. This is
why this method is called generalized discrepancy in [CA08].

To find a good witness tensor $Q$, we use ideas from a second line of research. While the
norm framework of Equation (2) provides a nice approach to lower bound communication
complexity, it gives no hint about how to choose a good witness $Q$, which is in general a difficult problem.
Works by Sherstov [She07, She08] and Shi and Zhu [SZ07] in the two-party case, and
Chattopadhyay [Cha07] in the multiparty case provide an elegant way to choose a good witness for a
general class of matrices and tensors. These works look at block composed functions of the form
$f \circ g^n(x_1, \ldots, x_k) = f(g(x_1^1, \ldots, x_k^1), \ldots, g(x_1^n, \ldots, x_k^n))$. Notice that set intersection is a block
composed function where $f = \mathrm{OR}_n$ is the OR function on $n$ bits and $g = \mathrm{AND}_k$ is the $k$-player
AND function on one bit. Sherstov [She07] first showed that when $g(x, i) = x_i$, the discrepancy
of a block composed function could be bounded in terms of the threshold degree of $f$, the minimum
degree of a polynomial which agrees in sign with $f$ on the Boolean cube. Building on this
result, Chattopadhyay showed an analogous statement in the number-on-the-forehead case for an
appropriately generalized multiparty function $g$.
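
As a concrete illustration of the block-composition view, the following Python sketch evaluates set intersection as $\mathrm{OR}_n$ applied to $n$ copies of $\mathrm{AND}_k$ and compares it with the direct definition on all small inputs. The function names are hypothetical and chosen only for this illustration; the encoding of 'true' as $-1$ follows the convention fixed in the introduction.

```python
from itertools import product

TRUE, FALSE = -1, 1  # convention of the paper: -1 encodes 'true'

def AND(bits):
    """k-player AND on one bit per player: true iff every bit is true."""
    return TRUE if all(b == TRUE for b in bits) else FALSE

def OR(bits):
    """OR on n bits: true iff at least one bit is true."""
    return TRUE if any(b == TRUE for b in bits) else FALSE

def set_intersection(xs):
    """Block-composed form: xs is a list of k strings in {-1,+1}^n.
    AND is applied to position j of all k strings, then OR over the n positions."""
    n = len(xs[0])
    return OR([AND([x[j] for x in xs]) for j in range(n)])

def set_intersection_direct(xs):
    """Direct definition: is there a position j where every string has a -1?"""
    n = len(xs[0])
    return TRUE if any(all(x[j] == TRUE for x in xs) for j in range(n)) else FALSE

def agree_everywhere(k, n):
    """Exhaustively compare the two definitions for k players and n-bit strings."""
    strings = list(product([TRUE, FALSE], repeat=n))
    return all(set_intersection(xs) == set_intersection_direct(xs)
               for xs in product(strings, repeat=k))
```

The exhaustive check is only feasible for very small $k$ and $n$, since there are $2^{kn}$ inputs.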

Sherstov and independently Shi-Zhu showed that the approximate trace norm of a block composed
function could be lower bounded in terms of the approximate degree of $f$, again provided
that the inner function $g$ satisfies certain technical conditions. The $\mu$ norm provides bounds at least
as large as the trace norm method [LS07], thus these works also lower bound $\mu^\alpha$. In this paper,
we take the natural step to show that $\mu^\alpha$ of a block composed multiparty function can be lower
bounded in terms of the approximate degree of $f$, for a particular multiparty inner function $g$ such
that the composed function $f \circ g^n$ can be embedded in the set intersection problem.

1.3 Consequences for Lovász-Schrijver proof systems and beyond

Beame, Pitassi, and Segerlind [BPS06] show that bounds on multiparty disjointness imply strong 
lower bounds on the size of refutations of certain unsatisfiable formulas, for a very general class of 
proof systems. We now introduce and motivate the study of these proof systems. Formal definitions 
and the implications of our results will be given in Section 6.2. 

The fact that linear and semidefinite programs can be solved with high precision in polynomial
time is a remarkable algorithmic achievement. It is thus interesting to ask how these algorithms
fare when pitted against NP-complete problems. For many NP-complete problems, there is a
very natural approach to solving them via linear or semidefinite programming: namely, we first
formulate the problem as optimizing a convex function over the Boolean cube, i.e. with variables
subject to the quadratic constraints $x_i^2 = x_i$. We then relax these quadratic constraints to linear or
semidefinite constraints to obtain a program which can be solved in polynomial time. For example,
a linear relaxation of $x_i^2 = x_i$ may simply be the constraint $0 \le x_i \le 1$. In the case of vertex cover,
for example, such a simple relaxation already gives a linear program with approximation ratio
of 2. Semidefinite constraints are in general more complicated, but there are several "automatic"
ways of generating valid semidefinite inequalities, that is, semidefinite inequalities satisfied by all
Boolean solutions of the original problem. Perhaps the best known of these is the Lovász-Schrijver
"lift and project" method [LS91]. The seminal 0.878-approximation algorithm for MAXCUT of
Goemans and Williamson [GW95] can be obtained by relaxing the natural Boolean programming
problem with semidefinite constraints obtained by one application of the Lovász-Schrijver method.

As these techniques have given impressive results in approximation algorithms, it is natural to 
ask if they can also be used to efficiently obtain exact solutions. Namely, how many inequalities 
need to be added in general until all fractional optima are eliminated and only true Boolean optima 
remain? 

One way to address this question is to consider proof systems with derivation rules based on
linear programming or the Lovász-Schrijver method. Our particular application will look at the
size of proofs needed to refute unsatisfiable formulas. Given a CNF $\phi$, we can naturally represent
the satisfiability of $\phi$ as the satisfiability of a system of linear inequalities, one for each clause. For
example, the clause $x_1 \vee x_4 \vee \neg x_5$ would be represented as $x_1 + x_4 + (1 - x_5) \ge 1$. Suppose
that $\phi$ is unsatisfiable. Then consider a proof system in which the "axioms" are the inequalities
obtained from the clauses of $\phi$, and the goal is to derive the contradiction $0 \ge 1$. By the results
of [BPS06], our results on disjointness imply that there are unsatisfiable formulas such that any
refutation obtained by generating new inequalities by the Lovász-Schrijver method in a "tree-like"
way requires size $2^{n^{\Omega(1)}}$. For a standard formulation of the Lovász-Schrijver method known as
LS+, bounds of size $2^{\Omega(n)}$ for tree-like proofs have already been shown by very different methods
[IK06].
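
The translation from clauses to linear inequalities can be made concrete with a short Python sketch. The signed-integer encoding of literals ($+i$ for $x_i$, $-i$ for $\neg x_i$) and all function names are assumptions made for this illustration and are not taken from [BPS06]. The sketch moves the constants of the negated literals to the right hand side, so the clause $x_1 \vee x_4 \vee \neg x_5$ becomes $x_1 + x_4 - x_5 \ge 0$, which is the same inequality as $x_1 + x_4 + (1 - x_5) \ge 1$.

```python
from itertools import product

def clause_to_inequality(clause):
    """clause: list of nonzero ints, +i for x_i and -i for NOT x_i (hypothetical encoding).
    Returns (coeffs, rhs) meaning sum_i coeffs[i] * x_i >= rhs over 0/1 variables."""
    coeffs = {}
    rhs = 1
    for lit in clause:
        var = abs(lit)
        if lit > 0:
            coeffs[var] = coeffs.get(var, 0) + 1
        else:
            # A negated literal contributes (1 - x_var); its constant 1 moves to the right.
            coeffs[var] = coeffs.get(var, 0) - 1
            rhs -= 1
    return coeffs, rhs

def clause_holds(clause, assignment):
    """True if some literal of the clause is satisfied by the 0/1 assignment."""
    return any((assignment[abs(lit)] == 1) == (lit > 0) for lit in clause)

def inequality_holds(coeffs, rhs, assignment):
    """True if the linear inequality is satisfied by the 0/1 assignment."""
    return sum(c * assignment[v] for v, c in coeffs.items()) >= rhs
```

On Boolean assignments the clause and its inequality are satisfied by exactly the same points; the inequality additionally admits fractional solutions, which is what the derivation rules of the proof system must eliminate.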

The advantage of the number-on-the-forehead communication complexity approach, however,
is that it can also be applied to much more powerful proof systems which are currently untouchable
by other methods. Beame, Pitassi, and Segerlind [BPS06] show that lower bounds on $k$-party
communication complexity of disjointness give lower bounds on the size of tree-like proofs of certain
unsatisfiable CNFs $\phi(x)$, where the derivation rule is as follows: from inequalities $f, g$ of degree
$k - 1$ in $x$, we are allowed to conclude a degree $k - 1$ inequality $h$ if every Boolean assignment
to $x$ which satisfies $f$ and $g$ also satisfies $h$. Lovász-Schrijver proof systems are a special case of
such degree-2 systems. Our bounds on disjointness imply the existence of unsatisfiable formulas
whose refutation requires subexponential size tree-like degree-$k$ proofs, for any constant $k$.$^1$ The
aforementioned lower bounds on LS+ proof systems strongly rely on specific properties of the
Lovász-Schrijver operator; showing superpolynomial bounds on the size of tree-like proofs in the
more general degree-$k$ model was previously open even in the case $k = 2$.



2 Preliminaries and notation 

We let $[n] = \{1, \ldots, n\}$. For multiparty communication complexity it is convenient to work with
tensors, the generalization of matrices to higher dimensions. If an element of a tensor $A$ is specified
by $k$ indices, we say that $A$ is a $k$-tensor. For a $k$-tensor $A$ of dimensions $(n_1, \ldots, n_k)$ we write
$\mathrm{size}(A) = n_1 \cdots n_k$. A tensor for which all entries are in $\{-1, 1\}$ we call a sign tensor. For a
function $f : X_1 \times \cdots \times X_k \to \{-1, 1\}$, we define the communication tensor corresponding to $f$ to be a
$k$-tensor $A_f$ where $A_f[x_1, \ldots, x_k] = f(x_1, \ldots, x_k)$. We identify $f$ with its communication tensor.
For a set $Z \subseteq X_1 \times \cdots \times X_k$ we let $\chi(Z)$ be its characteristic tensor where $\chi(Z)[x_1, \ldots, x_k] = 1$
if $(x_1, \ldots, x_k) \in Z$ and is 0 otherwise.

For a sign tensor $A$, we denote by $D^k(A)$ the deterministic communication complexity of
$A$ in the $k$-party number-on-the-forehead model. The public coin randomized communication
complexity with error bound $\epsilon \ge 0$ is denoted $R^k_\epsilon(A)$. We drop the superscript when the number
of players is clear from context.

We use the shorthand $A \ge c$ to indicate that all of the entries of $A$ are at least $c$. The Hadamard
or entrywise product of two tensors $A$ and $B$ is denoted by $A \circ B$. Their inner product is denoted
$\langle A, B \rangle = \sum_{x_1, \ldots, x_k} A[x_1, \ldots, x_k] B[x_1, \ldots, x_k]$. The $\ell_1$ and $\ell_\infty$ norms of a tensor $A$ are
$\|A\|_1 = \sum_{x_1, \ldots, x_k} |A[x_1, \ldots, x_k]|$ and $\|A\|_\infty = \max_{x_1, \ldots, x_k} |A[x_1, \ldots, x_k]|$, respectively.

$^1$The conference version of this paper reported bounds on degree-$k$ proof systems for up to
$k = \log\log n - O(\log\log\log n)$. As pointed out to us by Paul Beame, however, this is not justified by
the reduction of [BPS06], which requires certain constraints on the size of $k$.

We also need some basic elements of Fourier analysis. For $S \subseteq [n]$ we define
$\chi_S : \{-1,+1\}^n \to \{-1, 1\}$ as $\chi_S(x) = \prod_{i \in S} x_i$. As the $\chi_S$ form an orthogonal basis, for any function
$f : \{-1,+1\}^n \to \mathbb{R}$ we have a unique representation

$$f(x) = \sum_{S \subseteq [n]} \hat{f}(S) \chi_S(x)$$

where $\hat{f}(S) = (1/2^n) \langle f, \chi_S \rangle$ are the Fourier coefficients of $f$. The degree of $f$ is the size of the
largest set $S$ for which $\hat{f}(S)$ is nonzero.
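
For small $n$ the Fourier coefficients and the degree can be computed by brute force directly from these definitions. The following Python sketch does so; the function names and the 0-based indexing of coordinates are assumptions of this illustration, and the running time is exponential in $n$.

```python
from itertools import product, combinations
from fractions import Fraction

def chi(S, x):
    """Character chi_S(x): the product of x_i over i in S (S holds 0-based indices)."""
    result = 1
    for i in S:
        result *= x[i]
    return result

def fourier_coefficients(f, n):
    """Return {S: hat f(S)} where hat f(S) = 2^{-n} * sum_x f(x) * chi_S(x).
    Assumes f takes integer values so that exact fractions can be used."""
    cube = list(product([-1, 1], repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            total = sum(f(x) * chi(S, x) for x in cube)
            coeffs[S] = Fraction(total, 2 ** n)
    return coeffs

def degree(coeffs):
    """Size of the largest set S with a nonzero coefficient (0 for the zero function)."""
    return max((len(S) for S, c in coeffs.items() if c != 0), default=0)

def OR2(x):
    """OR on two bits with -1 encoding 'true'."""
    return -1 if -1 in x else 1
```

For `OR2` the expansion is $-\tfrac12 + \tfrac12 x_1 + \tfrac12 x_2 + \tfrac12 x_1 x_2$, so its degree is 2.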



3 The Method 

In this section we present a method for proving lower bounds on randomized communication 
complexity in the number-on-the-forehead model that generalizes and significantly strengthens the 
discrepancy method. 



3.1 Cylinder intersection norm 

In two-party communication complexity, a key role is played by combinatorial rectangles, that is, subsets
of the form $Z_1 \times Z_2$ where $Z_1$ is a subset of inputs to Alice and $Z_2$ is a subset of inputs to Bob. The
analogous concept in the number-on-the-forehead model of multiparty communication complexity
is that of a cylinder intersection.

Definition 1 (Cylinder intersection) A subset $Z_i \subseteq X_1 \times \cdots \times X_k$ is called a cylinder in the
$i$th dimension if membership in $Z_i$ does not depend on the $i$th coordinate. That is, for every
$(z_1, \ldots, z_i, \ldots, z_k) \in Z_i$ and $z_i' \in X_i$ it also holds that $(z_1, \ldots, z_i', \ldots, z_k) \in Z_i$. A set $Z$ is
called a cylinder intersection if it can be expressed as $Z = \bigcap_{i=1}^k Z_i$ where each $Z_i$ is a cylinder in
the $i$th dimension.
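
Definition 1 can be checked mechanically on small examples. The Python sketch below represents a subset of $X_1 \times \cdots \times X_k$ as a set of $k$-tuples; this representation, the 0-based dimension index, and the helper names are assumptions of the illustration only.

```python
from itertools import product

def is_cylinder(Z, i, domains):
    """True if membership in Z (a set of k-tuples) does not depend on coordinate i."""
    for z in Z:
        for v in domains[i]:
            # Replacing the i-th coordinate by any other value must stay inside Z.
            if z[:i] + (v,) + z[i + 1:] not in Z:
                return False
    return True

def cylinder_from_predicate(pred, i, domains):
    """Build a cylinder in dimension i: pred sees every coordinate except the i-th."""
    return {z for z in product(*domains) if pred(z[:i] + z[i + 1:])}

def cylinder_intersection(cylinders):
    """Intersection of the given cylinders, one per dimension."""
    return set.intersection(*cylinders)
```

In protocol terms, the cylinder in dimension $i$ collects the inputs on which player $i$, who does not see $x_i$, behaves in a fixed way.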

Cylinder intersections are important because a correct deterministic number-on-the-forehead 
protocol for a function / partitions the corresponding communication tensor into cylinder inter- 
sections, each of which is monochromatic with respect to the function /. 

Fact 2 Let $A$ be a sign $k$-tensor, and suppose that $D^k(A) \le c$. Then there are cylinder intersections
$Z_1, \ldots, Z_{2^c}$ such that

$$A = \sum_{i=1}^{2^c} a_i \chi(Z_i)$$

where $a_i \in \{-1, +1\}$.

Our main object of study, termed the cylinder intersection norm, relaxes this notion of decomposition
to allow $a_i \in \mathbb{R}$. A similar relaxation is made in [KKN95] in the context of
nondeterministic communication complexity.

Cylinder intersection norm We denote by $\mu$ the norm induced by the absolute convex hull of
the characteristic functions of all cylinder intersections. That is, for a $k$-tensor $B$

$$\mu(B) = \min \left\{ \sum_i |a_i| : B = \sum_i a_i \chi(Z_i),\ a_i \in \mathbb{R} \right\}$$

where each $Z_i$ is a cylinder intersection and $\chi(Z_i)$ is its characteristic tensor.

In the two dimensional case, $\mu$ is very closely related to the $\gamma_2$ norm [LMSS07, LS07]. Indeed,
for matrices $B$ we have $\mu(B) = \Theta(\gamma_2(B))$.

Remark 3 In our definition of $\mu$ above we chose to take $\chi(Z_i)$ as $\{0, 1\}$ tensors. One can
alternatively take them to be $\pm 1$ valued tensors (a form which is sometimes easier to bound) without
changing much. One can show

$$\mu(B) \ge \mu_{\pm 1}(B) \ge 2^{-k} \mu(B),$$

where $B$ is a $k$-tensor and $\mu_{\pm 1}(B)$ is defined as above with $\chi(Z_i)$ taking values from $\{-1, 1\}$. In
the matrix case, $\mu_{\pm 1}$ is also known as the nuclear norm [Jam87].

By Fact 2 we have the following. 

Theorem 4 It holds that $D^k(A) \ge \log(\mu(A))$ for every sign $k$-tensor $A$.

A public coin randomized protocol is simply a probability distribution over deterministic pro- 
tocols. This gives us the following fact: 

Fact 5 A sign $k$-tensor $A$ satisfies $R^k_\epsilon(A) \le c$ if and only if there are sign $k$-tensors $A_i$ for
$i = 1, \ldots, \ell$ satisfying $D^k(A_i) \le c$ and a probability distribution $(p_1, \ldots, p_\ell)$ such that

$$\left\| A - \sum_{i=1}^{\ell} p_i A_i \right\|_\infty \le 2\epsilon.$$

To lower bound randomized communication complexity we consider an approximate variant of the
cylinder intersection norm.

Definition 6 (Approximate cylinder intersection norm) Let $A$ be a sign $k$-tensor, and $\alpha \ge 1$.
We define the $\alpha$-approximate cylinder intersection norm as

$$\mu^\alpha(A) = \min \{ \mu(B) : 1 \le A \circ B \le \alpha \}.$$

In words, we take the minimum of the cylinder intersection norm over all tensors $B$ which are
signed as $A$ and have entries with magnitude between 1 and $\alpha$. Considering the limiting case as
$\alpha \to \infty$ motivates the definition

$$\mu^\infty(A) = \min \{ \mu(B) : 1 \le A \circ B \}.$$

One should note that $\mu^\alpha(A) \le \mu^\beta(A)$ for $1 \le \beta \le \alpha$.

The following theorem is an immediate consequence of the definition of the approximate cylin- 
der intersection norm and Fact 5. 

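Theorem 7 Let $A$ be a sign $k$-tensor, and $0 \le \epsilon < 1/2$. Then

$$R^k_\epsilon(A) \ge \log(\mu^\alpha(A)) - \log(\alpha_\epsilon)$$

where $\alpha_\epsilon = 1/(1 - 2\epsilon)$ and $\alpha \ge \alpha_\epsilon$.

Proof: Let $p_i$ and $A_i$ for $1 \le i \le \ell$ be as in Fact 5. We take

$$B = \frac{1}{1 - 2\epsilon} \sum_{i=1}^{\ell} p_i A_i.$$

Notice that $1 \le B \circ A \le \alpha_\epsilon$, and hence by Definition 6

$$\mu^{\alpha_\epsilon}(A) \le \mu(B).$$

Employing the fact that $\mu$ is a norm and Theorem 4, we get

$$\mu(B) \le \frac{1}{1 - 2\epsilon} \sum_{i=1}^{\ell} p_i \mu(A_i) \le \frac{2^{R^k_\epsilon(A)}}{1 - 2\epsilon}.$$

Since $\mu^\alpha(A) \le \mu^{\alpha_\epsilon}(A)$ for $\alpha \ge \alpha_\epsilon$, taking logarithms gives the theorem. □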



The nondeterministic complexity of a sign $k$-tensor $A$, denoted $N^k(A)$, is the logarithm of the
minimum cardinality of a set of cylinder intersections $\{Z_i\}$ such that every entry of $A$ with value
$-1$ is covered by some $Z_i$, and no entry of $A$ with value 1 is covered by any $Z_i$. Notice that if $\{Z_i\}$ is
such a covering of $A$, then letting $B = -\sum_i \chi(Z_i)$ we have $1 \le A \circ (2B + J) < \infty$ where $J$ is
the all-ones tensor. As $J$ is itself a cylinder, we have $\mu(J) = 1$, which gives the following.

Theorem 8 (folklore) For a sign $k$-tensor $A$,

$$N^k(A) \ge \log \frac{\mu^\infty(A) - 1}{2}.$$

As we shall see in Section 3.3, $\mu^\infty$ is exactly the discrepancy method, which explains why the
discrepancy method cannot show good lower bounds on disjointness, or indeed any function with
low nondeterministic or co-nondeterministic communication complexity.

3.2 Employing duality

We now have a quantity, $\mu^\alpha$, which can be used to prove lower bounds on randomized
communication complexity in the number-on-the-forehead model. As this quantity is defined in terms of
a minimization, however, it seems in itself a difficult quantity to bound from below.

In this section, we employ the duality theory of linear programming to find an equivalent
formulation of $\mu^\alpha(A)$ in terms of a maximization problem. This makes the task of proving lower
bounds for $\mu^\alpha$ much easier, as the $\forall$ quantifier we had to deal with before is now replaced by
an $\exists$ quantifier.

As it turns out, in order to prove lower bounds on $\mu^\alpha(A)$ we will need to understand the dual
norm of $\mu$, denoted $\mu^*$. The standard definition of a dual norm is

$$\mu^*(Q) = \max_{B : \mu(B) \le 1} |\langle B, Q \rangle|,$$

for any tensor $Q$. Since the unit ball of $\mu$ is the absolute convex hull of the characteristic vectors
of cylinder intersections, we can alternatively write

$$\mu^*(Q) = \max_Z |\langle Q, \chi(Z) \rangle|$$

where the maximum is taken over all cylinder intersections $Z$.
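
For very small tensors the second formula for $\mu^*$ can be evaluated by exhaustive search. The Python sketch below enumerates every cylinder intersection by choosing, for each dimension $i$, the set of $(k-1)$-tuples of the remaining coordinates that the $i$th cylinder accepts. The dictionary representation of $Q$ and the function names are assumptions of this illustration; the search is doubly exponential and is meant only as an executable restatement of the definition.

```python
from itertools import product, combinations

def all_subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)

def mu_star(Q, domains):
    """Brute-force mu^*(Q) = max over cylinder intersections Z of |<Q, chi(Z)>|.
    Q maps k-tuples to numbers; domains is a list of k lists of index values."""
    k = len(domains)
    points = list(product(*domains))
    # A cylinder in dimension i is described by the set of allowed values
    # of the coordinates other than the i-th.
    cylinder_choices = []
    for i in range(k):
        others = product(*(domains[:i] + domains[i + 1:]))
        cylinder_choices.append(list(all_subsets(others)))
    best = 0
    for cyls in product(*cylinder_choices):
        total = sum(Q[z] for z in points
                    if all(z[:i] + z[i + 1:] in cyls[i] for i in range(k)))
        best = max(best, abs(total))
    return best
```

For $k = 2$ the cylinder intersections are exactly the combinatorial rectangles, so on a matrix this computes the largest absolute sum of entries over a rectangle.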

It is instructive to compare this with the definition of discrepancy. 

Definition 9 (discrepancy) Let $A$ be a sign $k$-tensor, and let $P$ be a probability distribution on its
entries. The discrepancy of $A$ with respect to $P$, written $\mathrm{disc}_P(A)$, is

$$\mathrm{disc}_P(A) = \max_Z |\langle A \circ P, \chi(Z) \rangle|$$

where the maximum is taken over cylinder intersections $Z$.

Thus we see that $\mathrm{disc}_P(A) = \mu^*(A \circ P)$, and we can use existing techniques for discrepancy to
also upper bound $\mu^*$.

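As the dual of a dual norm is again the norm, we can write the $\mu$ norm as

$$\mu(B) = \max_Q \frac{\langle B, Q \rangle}{\mu^*(Q)}. \qquad (3)$$

To prove our lower bounds, we will use an equivalent formulation of $\mu^\alpha$ in terms of the dual norm $\mu^*$.

Theorem 10 Let $A$ be a sign tensor and $1 \le \alpha < \infty$. Then

$$\mu^\alpha(A) = \max_Q \frac{(1+\alpha)\langle A, Q \rangle + (1-\alpha)\|Q\|_1}{2\mu^*(Q)}.$$

When $\alpha = \infty$ we have

$$\mu^\infty(A) = \max_{Q : A \circ Q \ge 0} \frac{\langle A, Q \rangle}{\mu^*(Q)}.$$

Proof: We can quite easily see that the left hand side is at least as large as the right hand side,
which is all that is needed for proving lower bounds. By Equation (3) and the definition of $\mu^\alpha$ we
have

$$\mu^\alpha(A) = \min_{B : 1 \le A \circ B \le \alpha} \max_Q \frac{\langle B, Q \rangle}{\mu^*(Q)}.$$

If we rewrite $Q$ as the sum of two parts, $Q^+$ satisfying $Q^+ \circ A \ge 0$ and $Q^-$ satisfying $Q^- \circ A < 0$,
then we can see that

$$\mu^\alpha(A) \ge \max_Q \frac{\langle A, Q^+ \rangle + \alpha \langle A, Q^- \rangle}{\mu^*(Q^+ + Q^-)}.$$

It is now straightforward to verify, using $\langle A, Q^+ \rangle = (\|Q\|_1 + \langle A, Q \rangle)/2$ and
$\langle A, Q^- \rangle = (\langle A, Q \rangle - \|Q\|_1)/2$, that this expression can be reworked into the form given above
in the two cases $1 \le \alpha < \infty$ and $\alpha = \infty$.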
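
The reworking step in this proof is a short algebraic identity: for a sign tensor $A$, the numerator $\langle A, Q^+ \rangle + \alpha \langle A, Q^- \rangle$ equals $\frac{1}{2}\left((1+\alpha)\langle A, Q \rangle + (1-\alpha)\|Q\|_1\right)$. The Python sketch below checks this numerically with exact rational arithmetic. Tensors are flattened to plain lists, which is harmless because the identity is entrywise; the function name is hypothetical.

```python
import random
from fractions import Fraction

def check_identity(A, Q, alpha):
    """A, Q: equal-length flat lists (A has +1/-1 entries, Q integers); alpha: a Fraction.
    Returns True if <A,Q+> + alpha*<A,Q-> equals ((1+alpha)<A,Q> + (1-alpha)||Q||_1)/2."""
    inner = sum(a * q for a, q in zip(A, Q))   # <A, Q>
    l1 = sum(abs(q) for q in Q)                # ||Q||_1
    # Q+ keeps the entries whose sign agrees with A, Q- keeps the rest.
    inner_plus = sum(a * q for a, q in zip(A, Q) if a * q >= 0)
    inner_minus = sum(a * q for a, q in zip(A, Q) if a * q < 0)
    lhs = inner_plus + alpha * inner_minus
    rhs = ((1 + alpha) * inner + (1 - alpha) * l1) / 2
    return lhs == rhs
```

The check relies only on the facts that $\langle A, Q^+ \rangle = \|Q^+\|_1$ and $\langle A, Q^- \rangle = -\|Q^-\|_1$ when $A$ is a sign tensor.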

To see that this inequality holds with equality, we write $\mu^\alpha$ as a linear program and then use
duality to derive the dual expression given in the theorem. As it is easy to check that the primal
program is feasible with a finite optimum, by Slater's condition these primal and dual forms coincide
with the same finite value.

We treat the case $1 \le \alpha < \infty$ first. We can write $\mu^\alpha(A)$ as a linear program as follows. For
each cylinder intersection $Z_i$ let $X_i = \chi(Z_i)$. Then

$$\mu^\alpha(A) = \min_{p_i, q_i} \sum_i (p_i + q_i)$$
$$\text{s.t. } 1 \le \left( \sum_i (p_i - q_i) X_i \right) \circ A \le \alpha$$
$$p_i, q_i \ge 0.$$

Taking the dual of this program in the straightforward way, we obtain

$$\mu^\alpha(A) = \max_Q \frac{(1+\alpha)\langle A, Q \rangle + (1-\alpha)\|Q\|_1}{2}$$
$$\text{s.t. } |\langle X_i, Q \rangle| \le 1 \text{ for all } X_i.$$

For $\alpha = \infty$ we get the same program as above without the constraint $\left( \sum_i (p_i - q_i) X_i \right) \circ A \le \alpha$.
Dualizing this program gives the desired result. □

Let us take a moment to compare our approach with that of Chattopadhyay and Ada. They also
use the approximation $\mu$ norm, but with an additive approximation factor rather than a multiplicative
factor as we use. More precisely, they use the measure $\mu^\epsilon(A) = \min_{B : \|A - B\|_\infty \le \epsilon} \mu(B)$. The
dual form of this measure has the form

$$\mu^\epsilon(A) = \max_Q \frac{\langle A, Q \rangle - \epsilon \|Q\|_1}{\mu^*(Q)}.$$

Chattopadhyay and Ada directly derive that this dual expression is a lower bound on multiparty
distributional communication complexity. Yao's characterization of randomized complexity in
terms of distributional complexity [Yao83] then gives that it is also a lower bound on randomized
communication complexity. They do not mention the primal definition of $\mu^\epsilon$, but other than that,
their proof is similar in structure to ours. For our proof we do not use Yao's principle but apply
duality directly on the measure $\mu$ rather than on the complexity class itself.

While our presentation through the primal version of the norm $\mu^\alpha$ is perhaps not as familiar
as that via distributional complexity, we feel it does have advantages. First of all, this discussion
holds quite generally: for any norm $\Phi$ one can show using the separation theorem that the
approximation version $\Phi^\alpha$ has a dual characterization analogous to that in Theorem 10. Second, we feel
that the primal definition of $\mu^\alpha$ arises very naturally and gives insight into the origin of the dual
formulation: we do not have to guess this formula but can derive it. Finally, it is interesting to
note that the primal and dual formulations are equivalent. This means that we do not lose anything
in considering the more convenient dual formulation for proving lower bounds.



3.3 The discrepancy method 

Virtually all lower bounds in the general number-on-the-forehead model have used the discrepancy
method. Let $A$ be a sign tensor, and recall the definition of $\mathrm{disc}_P(A)$ from Section 3.2. Let
$\mathrm{disc}(A) = \min_P \mathrm{disc}_P(A)$, where the minimum is taken over all probability distributions $P$. The
discrepancy method turns out to be equivalent to $\mu^\infty(A)$.

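Theorem 11

$$\mu^\infty(A) = \frac{1}{\mathrm{disc}(A)}$$

Proof: By Theorem 10, for every sign tensor $A$

$$\mu^\infty(A) = \max_{Q : Q \circ A \ge 0} \frac{\langle A, Q \rangle}{\mu^*(Q)}.$$

We can rewrite this as

$$\mu^\infty(A) = \max_{Q : Q \circ A \ge 0} \frac{\langle A, Q \rangle}{\mu^*(Q)} = \max_{P : P \ge 0} \frac{\langle A, A \circ P \rangle}{\mu^*(A \circ P)}.$$

As both numerator and denominator are homogeneous, we have

$$\mu^\infty(A) = \max_{P : P \ge 0,\ \|P\|_1 = 1} \frac{\langle A, A \circ P \rangle}{\mu^*(A \circ P)} = \max_{P : P \ge 0,\ \|P\|_1 = 1} \frac{1}{\mu^*(A \circ P)} = \frac{1}{\mathrm{disc}(A)}.$$

□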

4 Techniques to bound $\mu^*$

In the last section, we saw that to bound the randomized number-on-the-forehead communication
complexity of a sign tensor $A$, it suffices to find a tensor $Q$ such that $\langle A, Q \rangle$ is large and $\mu^*(Q)$
is small. The first quantity is relatively simple and is in general not too hard to compute. Upper
bounding $\mu^*(Q)$ is more subtle. In this section, we review some techniques for doing this.

In upper bounding the magnitude of the largest eigenvalue of a matrix $B$, a common approach is
to consider the matrix $BB^T$, and use the fact that $\|B\|^2 \le \|BB^T\|$. We will try to do a similar
thing in upper bounding $\mu^*$. In analogy with $BB^T$ we make the next definition. Here and in what
follows all expectations are taken with respect to the uniform distribution.

Definition 12 (Contraction product) Let $B$ be a $k$-tensor with entries indexed by elements from
$X_1 \times \cdots \times X_k$. We define the contraction product of $B$ along $X_1$, denoted $B \bullet_1 B$, to be a
$2(k-1)$-tensor with entries indexed by elements from $X_2 \times X_2 \times \cdots \times X_k \times X_k$. The
$(x_2, x_2', \ldots, x_k, x_k')$ entry is defined to be

$$B \bullet_1 B[x_2, x_2', \ldots, x_k, x_k'] = \mathbb{E}_{x_1} \left[ \prod_{y_2 \in \{x_2, x_2'\}, \ldots, y_k \in \{x_k, x_k'\}} B[x_1, y_2, \ldots, y_k] \right].$$

The contraction product may be defined along other dimensions mutatis mutandis.
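
The contraction product can be computed directly from Definition 12 for small tensors. In the Python sketch below a tensor is a dictionary from index tuples to integers, and each choice $y_i \in \{x_i, x_i'\}$ is counted with multiplicity even when $x_i = x_i'$; this reading is an assumption of the illustration, made so that the matrix case reproduces the averaged Gram matrix with entries $\mathbb{E}_{x_1}[B[x_1, y] B[x_1, y']]$. The function name is hypothetical.

```python
from itertools import product
from fractions import Fraction

def contraction_product(B, domains):
    """B: dict mapping k-tuples (x_1, ..., x_k) to integers; domains[i] lists X_{i+1}.
    Returns the contraction along X_1 as a dict keyed by (x_2, x_2', ..., x_k, x_k')."""
    X1, rest = domains[0], domains[1:]
    # Each remaining coordinate contributes a pair (x_i, x_i').
    pair_domains = [list(product(d, repeat=2)) for d in rest]
    result = {}
    for pairs in product(*pair_domains):
        total = 0
        for x1 in X1:
            term = 1
            # y ranges over all choices of y_i from the pair (x_i, x_i').
            for y in product(*pairs):
                term *= B[(x1,) + y]
            total += term
        key = tuple(v for pair in pairs for v in pair)
        # Uniform expectation over x_1, kept exact.
        result[key] = Fraction(total, len(X1))
    return result
```

For a sign tensor, every entry with $x_i = x_i'$ for all $i$ equals 1 under this convention, since each factor then appears an even number of times.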

Notice that when $B$ is an $m$-by-$n$ matrix $B \bullet_1 B$ corresponds to $(1/m) BB^T$. In analogy with
the fact that $\|B\|^2 \le m \|B \bullet_1 B\|$, the next lemma gives a corresponding statement for the $\mu^*$
norm and $k$-tensors. This lemma originated in the work of Babai, Nisan, and Szegedy [BNS89]
(see also [Chu90, Raz00]) and all lower bounds on the general model of randomized
number-on-the-forehead complexity use some version of this lemma. The particular statement we use is from
Chattopadhyay [Cha07].



Lemma 13 Let $B$ be a $k$-tensor. Then

\[
\left( \frac{\mu^*(B)}{\mathrm{size}(B)} \right)^{2^{k-1}} \le \frac{\mu^*(B \bullet_1 B)}{\mathrm{size}(B \bullet_1 B)} \le \mathbb{E}[|B \bullet_1 B|].
\]

Proof: The second inequality follows since $\mu^*(X) \le \|X\|_1$ for any real tensor $X$. The first
inequality is standard, and follows by applying the Cauchy-Schwarz inequality repeatedly $k-1$
times. □



4.1 Example: Hadamard tensors 

We give an example to show how Lemma 13 can be used in conjunction with our $\mu$ method. Let
$H$ be an $N$-by-$N$ Hadamard matrix. We show that $\mu^\infty(H) \ge \sqrt{N}$. Indeed, simply let the witness
matrix $Q$ be $H$ itself. Incidentally, this corresponds to taking the uniform probability distribution
in the discrepancy method. With this choice we clearly have $H \circ Q \ge 0$, and so

\[
\mu^\infty(H) \ge \frac{\langle H, H \rangle}{\mu^*(H)} = \frac{N^2}{\mu^*(H)}.
\]

Now we bound $\mu^*(H)$ using Lemma 13, which gives

\[
\mu^*(H)^2 \le N^4 \, \mathbb{E}[|H \bullet_1 H|] = N^3,
\]

as $H \bullet_1 H$ has nonzero entries only on the diagonal, and these entries are of magnitude one.
Combining the two displays gives $\mu^\infty(H) \ge N^2 / N^{3/2} = \sqrt{N}$.
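The Hadamard matrix computation can be checked numerically. The following is a minimal sketch, assuming the Sylvester construction as the example matrix and contracting along the row index; the helper names (`sylvester_hadamard`, `contraction_along_rows`, `mean_abs`) are illustrative and not taken from the paper.

```python
def sylvester_hadamard(t):
    """Return the 2^t-by-2^t Sylvester Hadamard matrix as a list of rows."""
    H = [[1]]
    for _ in range(t):
        # H_{2n} = [[H, H], [H, -H]]
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def contraction_along_rows(B):
    """(B •_1 B)[y, y'] = E_x B[x][y] * B[x][y'], with x uniform over the rows."""
    rows, cols = len(B), len(B[0])
    return [[sum(B[x][y] * B[x][yp] for x in range(rows)) / rows
             for yp in range(cols)] for y in range(cols)]

def mean_abs(C):
    """E[|C|]: the average absolute value of the entries of C."""
    flat = [abs(v) for row in C for v in row]
    return sum(flat) / len(flat)

N = 8
H = sylvester_hadamard(3)
C = contraction_along_rows(H)           # identity matrix for a Hadamard H
avg = mean_abs(C)                       # 1/N
mu_star_sq_bound = N ** 4 * avg         # Lemma 13 with k = 2 gives mu*(H)^2 <= N^3
lower_bound = N ** 2 / mu_star_sq_bound ** 0.5   # <H, H> divided by the bound on mu*(H)
```

For this choice the contraction product is the identity, so `avg` is $1/N$ and `lower_bound` evaluates to $\sqrt{N}$, matching the argument in the text.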

Ford and Gál [FG05] extend the notion of matrix orthogonality to tensors, defining what they
call Hadamard tensors.

Definition 14 (Hadamard tensor) Let $H$ be a sign $k$-tensor. We say that $H$ is a Hadamard tensor
if

\[
(H \bullet_1 H)[x_2, x_2', \ldots, x_k, x_k'] = 0
\]

whenever $x_i \ne x_i'$ for all $i = 2, \ldots, k$.

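The simple proof above for Hadamard matrices can be easily extended to Hadamard tensors:

Theorem 15 (Ford and Gál [FG05]) Let $H$ be a Hadamard $k$-tensor of side length $N$. Then

\[
\mu^\infty(H) \ge \left( \frac{N}{k-1} \right)^{1/2^{k-1}}.
\]

Proof: We again take the witness $Q$ to be $H$ itself. This clearly satisfies $H \circ Q \ge 0$, and so

\[
\mu^\infty(H) \ge \frac{\langle H, H \rangle}{\mu^*(H)} = \frac{N^k}{\mu^*(H)}.
\]

It now remains to upper bound $\mu^*(H)$, which we do by Lemma 13. This gives us

\[
\mu^*(H)^{2^{k-1}} \le N^{k 2^{k-1}} \, \mathbb{E}[|H \bullet_1 H|].
\]

The "Hadamard" property of $H$ lets us easily upper bound $\mathbb{E}[|H \bullet_1 H|]$. Note that each entry of
$H \bullet_1 H$ is of magnitude at most one, and the probability of a non-zero entry is at most

\[
\Pr[\exists i : x_i = x_i'] \le \frac{k-1}{N}
\]

by a union bound. Hence, we obtain

\[
\mu^*(H)^{2^{k-1}} \le (k-1) \frac{N^{k 2^{k-1}}}{N}.
\]

Putting everything together, we have

\[
\mu^\infty(H) \ge \left( \frac{N}{k-1} \right)^{1/2^{k-1}}.
\]

□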



Remark 16 By doing a more careful inductive analysis, Ford and Gál obtain this result without
the $k-1$ term in the denominator. They also construct explicit examples of Hadamard tensors.

5 Lower bounds on $\mu^\alpha$ for pattern tensors

In Section 5.1 we describe a key lemma which relates the approximate polynomial degree of $f$ to
the existence of a hard input "distribution" for $f$. This will only truly correspond to a distribution
in the case of discrepancy; otherwise it can take on negative values. This lemma was first used in
the context of communication complexity by Sherstov [She08] and independently by Shi and Zhu
[SZ07].

In Section 5.2 we use this distribution, together with the machinery developed in Section 4 
to prove lower bounds on a special kind of tensors, named pattern tensors. The application to 
disjointness appears in Section 6.1. 

5.1 Dual polynomials 

We define approximate degree in a slightly non-standard way to more smoothly handle both the
bounded $\alpha$ and $\alpha = \infty$ cases.

Definition 17 Let $f : \{-1,+1\}^n \to \{-1,+1\}$. For $\alpha \ge 1$ we say that a function $g$ gives an $\alpha$-approximation
to $f$ if $1 \le g(x)f(x) \le \alpha$ for all $x \in \{-1,+1\}^n$. Similarly we say that $g$ gives an
$\infty$-approximation to $f$ if $1 \le g(x)f(x)$ for all $x \in \{-1,+1\}^n$. We let the $\alpha$-approximate degree of
$f$, denoted $\deg_\alpha(f)$, be the smallest degree of a function $g$ which gives an $\alpha$-approximation to $f$.

Remark 18 In a more standard scenario, one considers a 0/1 valued function $f$ and defines
the approximate degree as $\deg'_\epsilon(f) = \min\{\deg(g) : \|f - g\|_\infty \le \epsilon\}$. Letting $f_\pm$ be the sign
representation of $f$, one can see that for $0 \le \epsilon < 1/2$ our definition is equivalent to the standard
one in the following sense: $\deg'_\epsilon(f) = \deg_{\alpha_\epsilon}(f_\pm)$ where $\alpha_\epsilon = \frac{1+2\epsilon}{1-2\epsilon}$.
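Definition 17 can be checked by brute force on small examples. The sketch below is illustrative only: it assumes the sign convention in which $-1$ encodes "true", and the candidate approximant `g` is a hypothetical degree-1 polynomial chosen for the example, not one taken from the paper.

```python
from itertools import product

def approximation_factor(f, g, n):
    """Return the smallest alpha for which g is an alpha-approximation of f
    in the sense of Definition 17, or None if g(x) f(x) < 1 for some x."""
    values = [g(x) * f(x) for x in product((-1, 1), repeat=n)]
    if min(values) < 1:
        return None
    return max(values)

def alpha_from_epsilon(eps):
    """alpha_eps = (1 + 2 eps) / (1 - 2 eps) as in Remark 18, for 0 <= eps < 1/2."""
    return (1 + 2 * eps) / (1 - 2 * eps)

def or2(x):
    """OR on two bits, with -1 encoding "true" (an assumed convention)."""
    return -1 if -1 in x else 1

def g(x):
    """Hypothetical degree-1 candidate approximant for or2."""
    return x[0] + x[1] - 1

factor = approximation_factor(or2, g, 2)
```

Here $g(x)f(x)$ takes the values 1, 1, 1, 3 over the four inputs, so `g` gives a 3-approximation and $\deg_3(\mathrm{OR}_2) \le 1$. Under the correspondence of Remark 18, $\alpha = 3$ matches error $\epsilon = 1/4$ in the standard definition.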

For a fixed natural number $d$, let $\alpha_d(f)$ be the smallest value of $\alpha$ for which there is a degree
$d$ polynomial which gives an $\alpha$-approximation to $f$. Notice that $\alpha_d(f)$ can be written as a linear
program. Namely, let $B(n,d) = \sum_{i=0}^{d} \binom{n}{i}$, and let $W$ be a $2^n$-by-$B(n,d)$ incidence matrix, with rows
labelled by strings $x \in \{-1,+1\}^n$ and columns labelled by monomials of degree at most $d$. We set
$W(x, m) = m(x)$, where $m(x)$ is the evaluation of the monomial $m$ on input $x$. Then

\[
\alpha_d(f) = \min_y \{ \|Wy\|_\infty : 1 \le Wy \circ f \}.
\]

If this program is infeasible with value $\alpha$ (that is, if there is no degree $d$ polynomial which gives
an $\alpha$-approximation to $f$) then the feasibility of the dual of this program will give us a "witness"
to this fact. We refer to this witness as a dual polynomial for $f$. It is this witness that we will use
to construct a tensor $Q$ which witnesses that $\mu^\alpha$ is large.

Lemma 19

\[
\alpha_d(f) = \max_v \left\{ \frac{1 + \langle v, f \rangle}{1 - \langle v, f \rangle} : \|v\|_1 = 1, \ v^T W = 0 \right\}.
\]

Proof: Follows from duality theory of linear programming. □

Corollary 20 (Sherstov Corollary 3.3.1 [She08], Shi-Zhu Section 3.1 [SZ07]) Let $f : \{-1,+1\}^n \to \mathbb{R}$
and let $d = \deg_\alpha(f)$. Then there exists a function $v : \{-1,+1\}^n \to \mathbb{R}$ such that

1. $\langle v, \chi_T \rangle = 0$ whenever $|T| < d$.

2. $\|v\|_1 = 1$.

3. $\langle v, f \rangle \ge \frac{\alpha - 1}{\alpha + 1}$.

When $\alpha = \infty$, there is a function $v : \{-1,+1\}^n \to \mathbb{R}$ satisfying items (1), (2), and such that
$v(x)f(x) \ge 0$ for all $x \in \{-1,+1\}^n$.

Špalek [Spa08] has given an explicit construction of an optimal dual polynomial for the OR
function. For our analysis, however, we only make use of the properties guaranteed by Corollary 20.
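To make the role of a dual polynomial concrete, the following sketch brute-forces the three properties that the later proofs draw from Corollary 20: orthogonality to all characters $\chi_T$ with $|T| < d$, unit $\ell_1$ norm, and the correlation bound $\langle v, f \rangle \ge (\alpha - 1)/(\alpha + 1)$ used in the proof of Theorem 23. The function names and the parity example are assumptions made for illustration; inner products are taken as plain sums over the cube.

```python
from itertools import product, combinations

def chi(T, x):
    """Character chi_T(x): the product of x_i over i in T."""
    p = 1
    for i in T:
        p *= x[i]
    return p

def check_dual_witness(v, f, n, d, alpha, tol=1e-12):
    """Brute-force check that v behaves like a dual polynomial for f."""
    cube = list(product((-1, 1), repeat=n))
    # <v, chi_T> = 0 for every T with |T| < d
    orthogonal = all(
        abs(sum(v(x) * chi(T, x) for x in cube)) <= tol
        for size in range(d)
        for T in combinations(range(n), size)
    )
    # ||v||_1 = 1
    unit_norm = abs(sum(abs(v(x)) for x in cube) - 1) <= tol
    # <v, f> >= (alpha - 1) / (alpha + 1)
    correlated = sum(v(x) * f(x) for x in cube) >= (alpha - 1) / (alpha + 1) - tol
    return orthogonal and unit_norm and correlated

n = 3
parity = lambda x: x[0] * x[1] * x[2]
v = lambda x: parity(x) / 2 ** n      # normalized parity as a candidate witness
ok = check_dual_witness(v, parity, n, 3, 10.0)
```

For parity on three bits the normalized function itself passes all three checks with $d = 3$, since parity is orthogonal to every character of lower degree and $\langle v, f \rangle = 1$.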

5.2 Pattern Tensors 

We define a natural generalization of the pattern matrices of Sherstov [She07] to the tensor case.
We use a slightly different definition of pattern tensors than that of Chattopadhyay [Cha07] to
allow the reduction to disjointness.

Let $\phi : \{-1,+1\}^m \to \mathbb{R}$ be a function and $M$ a natural number. We define a $k$-dimensional
pattern tensor $A_{k,M,\phi}$ as follows. Let $x \in \{-1,+1\}^{mM^{k-1}}$. We view $x = (x^1, \ldots, x^m)$ as
consisting of $m$ many blocks, where each $x^i \in \{-1,+1\}^{M^{k-1}}$ can be viewed as a $k-1$ dimensional
tensor of side length $M$. We further let $y_i \in [M]^m$ for each $i = 1, \ldots, k-1$ and view each
$y_i = (y_i[1], \ldots, y_i[m])$ as consisting of $m$ blocks where $y_i[j] \in [M]$ is an index into a side of $x^j$.

Now define

\[
A_{k,M,\phi}[x, y_1, \ldots, y_{k-1}] = \phi(x^1[y_1[1], \ldots, y_{k-1}[1]], \ldots, x^m[y_1[m], \ldots, y_{k-1}[m]]).
\]

Note that $\mathrm{size}(A_{k,M,\phi}) = 2^{mM^{k-1}} M^{m(k-1)}$. We will often use the abbreviation $y = (y_1, \ldots, y_{k-1})$.
A nice property of pattern tensors is that every $m$-bit string $z$ appears as input to $\phi$ an equal number
of times, over all choices of $x, y$.
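The definition and the balance property can be illustrated on a toy instance. This is a minimal sketch under assumed conventions: indices run over `range(M)` rather than $[M]$, each block $x^i$ is stored as a dictionary from index tuples to $\pm 1$, and the function names are hypothetical.

```python
from itertools import product
from collections import Counter

def selected_bits(x, ys, m, k):
    """The m-bit string fed to phi: bit i is x^i[y_1[i], ..., y_{k-1}[i]]."""
    return tuple(x[i][tuple(ys[j][i] for j in range(k - 1))] for i in range(m))

def pattern_entry(phi, x, ys, m, k):
    """Entry A[x, y_1, ..., y_{k-1}] of the pattern tensor built from phi."""
    return phi(selected_bits(x, ys, m, k))

def input_counts(m, M, k):
    """Count how often each m-bit string is fed to phi over all (x, y)."""
    cells = list(product(range(M), repeat=k - 1))            # positions in one block
    blocks = [dict(zip(cells, bits))
              for bits in product((-1, 1), repeat=len(cells))]
    index_vectors = list(product(range(M), repeat=m))         # one y_i in [M]^m
    counts, total = Counter(), 0
    for x in product(blocks, repeat=m):
        for ys in product(index_vectors, repeat=k - 1):
            counts[selected_bits(x, ys, m, k)] += 1
            total += 1
    return counts, total

# Toy instance: k = 3 players, side length M = 2, m = 2 blocks.
counts, total = input_counts(2, 2, 3)
```

For this instance `total` equals $2^{mM^{k-1}} M^{m(k-1)} = 4096$, and each of the four 2-bit strings is selected the same number of times, as the text asserts.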

The key lemma about pattern tensors is given next. Such a lemma was first shown by Chattopadhyay
[Cha07]. Chattopadhyay and Ada [CA08] also show a statement similar to this one.

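Lemma 21 Let $A$ be a $(k, M, c \cdot \phi)$ pattern tensor, where $c = 2^m \mathrm{size}(A)^{-1}$. Suppose that $\phi$ satisfies
$\ell_1(\phi) = 1$ and $\hat{\phi}(T) = 0$ for all sets $T \subseteq [m]$ with $|T| \le d$. Then

\[
\mu^*(A) \le 2^{-d/2^{k-1}}
\]

provided that $M \ge 2e(k-1)2^{2^{k-1}} m/d$.

Proof: The idea of the proof will be to bound $\mathbb{E}[|A \bullet_1 A|]$ and apply Lemma 13 to obtain an upper
bound on $\mu^*(A)$. For a string $\ell \in \{0,1\}^{k-1}$ we use the abbreviation $y^\ell = (y_1^{\ell_1}, \ldots, y_{k-1}^{\ell_{k-1}})$. In
particular, $y^0 = (y_1^0, \ldots, y_{k-1}^0)$ and $y^1 = (y_1^1, \ldots, y_{k-1}^1)$.

\begin{align}
\mathbb{E}[|A \bullet_1 A|]
&= \left( \frac{2^m}{\mathrm{size}(A)} \right)^{2^{k-1}} \mathbb{E}_{y^0, y^1} \left| \mathbb{E}_x \prod_{\ell=0}^{2^{k-1}-1} \sum_{T_\ell \subseteq [m]} \hat{\phi}(T_\ell) \prod_{i \in T_\ell} x^i[y_1^{\ell_1}[i], \ldots, y_{k-1}^{\ell_{k-1}}[i]] \right| \tag{4} \\
&\le \frac{1}{\mathrm{size}(A)^{2^{k-1}}} \mathbb{E}_{y^0, y^1} \sum_{T_0, \ldots, T_{2^{k-1}-1} : |T_\ell| > d} \left| \prod_{i=1}^{m} \mathbb{E}_{x^i} \prod_{\ell : i \in T_\ell} x^i[y_1^{\ell_1}[i], \ldots, y_{k-1}^{\ell_{k-1}}[i]] \right| \tag{5}
\end{align}

Here we have used the assumption that $\hat{\phi}(T) = 0$ whenever $|T| \le d$ and the fact that $|\hat{\phi}(T)| \le 2^{-m}\ell_1(\phi) = 2^{-m}$.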

We now develop a sufficient condition in terms of $y^0, y^1$ and $T_0, \ldots, T_{2^{k-1}-1}$ for the product
of expectations over $x$ to be zero. We say that $y^0, y^1$ select a nondegenerate cube in position
$i$ if $y_j^0[i] \ne y_j^1[i]$ for all $j = 1, \ldots, k-1$. The reason for this terminology is that in this case
$(y_1^{\ell_1}[i], \ldots, y_{k-1}^{\ell_{k-1}}[i])$ define $2^{k-1}$ distinct points over $\ell \in \{0,1\}^{k-1}$. If this is not the case, we say
that $y^0, y^1$ select a degenerate cube in position $i$.

Notice that if $y^0, y^1$ select a nondegenerate cube in position $i \in [m]$ and $i \in T_\ell$ for some
$\ell \in \{0,1\}^{k-1}$ then

\[
\mathbb{E}_{x^i} \prod_{\ell : i \in T_\ell} x^i[y_1^{\ell_1}[i], \ldots, y_{k-1}^{\ell_{k-1}}[i]] = 0.
\]

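We will now upper bound the probability over the choice of $y^0, y^1$ and $T_0, \ldots, T_{2^{k-1}-1}$ that this
does not happen. Suppose that $y^0, y^1$ select $g$ many degenerate cubes. By the above reasoning the
number of sets $T_0, \ldots, T_{2^{k-1}-1}$ which lead to a nonzero expectation is at most

\[
\left( \sum_{r=d+1}^{g} \binom{g}{r} \right)^{2^{k-1}} \le 2^{g 2^{k-1}}.
\]

Now we bound the probability that $y^0, y^1$ select $g$ many degenerate cubes. The probability that
$y_j^0[i] = y_j^1[i]$ is $1/M$. Thus by a union bound, the probability that a single cube is degenerate is
at most $(k-1)/M$. Finally, as each index is chosen independently, the probability of $g$ many
degenerate cubes is at most

\[
\binom{m}{g} \left( \frac{k-1}{M} \right)^g.
\]

Putting everything together we have

\begin{align*}
\mathbb{E}[|A \bullet_1 A|]
&\le \frac{1}{\mathrm{size}(A)^{2^{k-1}}} \sum_{g=d+1}^{m} \binom{m}{g} \left( \frac{k-1}{M} \right)^g 2^{g 2^{k-1}} \\
&\le \frac{1}{\mathrm{size}(A)^{2^{k-1}}} \sum_{g=d+1}^{m} \left( \frac{e(k-1)2^{2^{k-1}} m}{dM} \right)^g \\
&\le \frac{2^{-d}}{\mathrm{size}(A)^{2^{k-1}}}
\end{align*}

provided that $M \ge 2e(k-1)2^{2^{k-1}} m/d$. □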
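The two probabilistic estimates at the end of the proof can be sanity-checked with exact arithmetic. The sketch below is an illustration under stated assumptions: it evaluates the sum over $g > d$ of $\binom{m}{g}((k-1)/M)^g 2^{g 2^{k-1}}$ with the common $\mathrm{size}(A)$ factor omitted, and the parameter values are arbitrary small choices, not ones used in the paper.

```python
from fractions import Fraction
from math import comb, ceil, e

def degenerate_probability(M, k):
    """Exact probability that one position selects a degenerate cube, i.e. that
    at least one of the k-1 index pairs (y_j^0[i], y_j^1[i]) collides."""
    return 1 - (1 - Fraction(1, M)) ** (k - 1)

def union_bound(M, k):
    """The union bound (k-1)/M used in the proof."""
    return Fraction(k - 1, M)

def tail_sum(m, M, k, d):
    """Sum over g = d+1..m of C(m, g) ((k-1)/M)^g 2^(g 2^(k-1))."""
    p = union_bound(M, k)
    return sum(comb(m, g) * p ** g * 2 ** (g * 2 ** (k - 1))
               for g in range(d + 1, m + 1))

# Arbitrary small parameters; M is the least integer meeting the stated condition.
k, m, d = 3, 10, 3
M = ceil(2 * e * (k - 1) * 2 ** (2 ** (k - 1)) * m / d)
tail = tail_sum(m, M, k, d)
```

With these values the exact degeneracy probability sits below the union bound, and `tail` is at most $2^{-d}$, in line with the final display.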



Remark 22 Our analysis cannot be improved by much without using more explicit information
about the Fourier coefficients $\hat{q}(T)$ than given in Corollary 20. Apart from removing the Fourier
coefficients, the only inequality we have used to arrive at Equation (5) is to turn an absolute value
of a sum into a sum of absolute values. When $y^0, y^1$ select a degenerate cube, the most likely case
is that it is what we call 1-degenerate, that is, $y_j^0[i] = y_j^1[i]$ for exactly one $1 \le j \le k-1$. If the
degenerate cubes selected by $y^0, y^1$ are all 1-degenerate, then one can see that the only sets $\{T_\ell\}$
which lead to a nonzero expectation are ones where the sets come in pairs. The number of such
paired sets $\{T_\ell\}$ is not significantly smaller than the upper bound we give; furthermore, in this
case all Fourier coefficients will be taken to an even power and so no cancellation occurs and the
absolute value of the sum will be equal to the sum of absolute values.

With this lemma in hand, we can now show our main result, proving a lower bound on $\mu^\alpha(A_{k,M,f})$
in terms of the approximate degree of $f$.

Theorem 23 For a nonnegative integer $m$, a Boolean function $f$ on $m$ variables, and an
integer $k \ge 2$,

\[
\log \mu^\alpha(A_{k,M,f}) \ge \deg_{\alpha_0}(f)/2^{k-1} + \log \frac{\alpha_0 - \alpha}{\alpha_0 + 1}
\]

for every $1 \le \alpha < \alpha_0 < \infty$, provided $M \ge 2e(k-1)2^{2^{k-1}} m/\deg_{\alpha_0}(f)$.
Furthermore,

\[
\log \mu^\infty(A_{k,M,f}) \ge \deg_\infty(f)/2^{k-1},
\]

provided $M \ge 2e(k-1)2^{2^{k-1}} m/\deg_\infty(f)$.

Proof: For simplicity we will drop the subscripts and just write $A$ for $A_{k,M,f}$. Recall that

\[
\mu^\alpha(A) = \max_{Q : \|Q\|_1 = 1} \frac{(1+\alpha)\langle A, Q \rangle + (1-\alpha)}{2\mu^*(Q)},
\qquad
\mu^\infty(A) = \max_{Q : Q \circ A \ge 0} \frac{\langle A, Q \rangle}{\mu^*(Q)}.
\]

Let $q$ be the vector from Corollary 20 which witnesses that the $\alpha_0$-approximate degree of $f$
is at least $d$. We let $Q$ be the $(k, M, c \cdot q)$ pattern tensor where $c = 2^m/\mathrm{size}(A)$. This choice of
normalization implies that $\|Q\|_1 = 1$ as $\|q\|_1 = 1$.

First consider the case $1 \le \alpha < \infty$. Then we have $\langle q, f \rangle \ge (\alpha_0 - 1)/(\alpha_0 + 1)$, and so
$\langle A, Q \rangle \ge (\alpha_0 - 1)/(\alpha_0 + 1)$. This allows us to bound the term in the numerator of $\mu^\alpha$ as
follows:

\[
\frac{(1+\alpha)\langle A, Q \rangle + (1-\alpha)}{2} \ge \frac{\alpha_0 - \alpha}{\alpha_0 + 1}.
\]

In the case $\alpha = \infty$, observe that $Q$ inherits the property $Q \circ A \ge 0$ as $q \circ f \ge 0$. The fact that
$q \circ f \ge 0$ together with $\|q\|_1 = 1$ gives $\langle f, q \rangle = 1$, which in turn implies $\langle A, Q \rangle = 1$.

Let $d = \deg_{\alpha_0}(f)$ or $d = \deg_\infty(f)$, respectively. As $q$ has no nonzero Fourier coefficients of
degree less than $d$ by Corollary 20, we can apply Lemma 21 to give

\[
\mu^*(Q) \le 2^{-d/2^{k-1}}
\]

under the assumption that $M \ge 2e(k-1)2^{2^{k-1}} m/d$. The statement now follows from Lemma 13.

□ 
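The numerator bound used in the finite case is a short computation. A worked version, assuming only $\langle A, Q \rangle \ge (\alpha_0 - 1)/(\alpha_0 + 1)$ and $\alpha \ge 1$ (so that the coefficient $1 + \alpha$ is positive), reads:

```latex
\begin{align*}
\frac{(1+\alpha)\langle A, Q \rangle + (1-\alpha)}{2}
&\ge \frac{(1+\alpha)(\alpha_0 - 1) + (1-\alpha)(\alpha_0 + 1)}{2(\alpha_0 + 1)} \\
&= \frac{2\alpha_0 - 2\alpha}{2(\alpha_0 + 1)}
 = \frac{\alpha_0 - \alpha}{\alpha_0 + 1}.
\end{align*}
```

Dividing this quantity by the upper bound on $\mu^*(Q)$ supplied by Lemma 21 and taking logarithms yields the two terms in the first bound of Theorem 23.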



6 Applications 

In this section, we apply Theorem 23 to prove lower bounds on the $k$-party number-on-the-forehead
randomized communication complexity of disjointness. Then we formally state the implications
this result has for proof systems via the results of Beame, Pitassi, and Segerlind [BPS06].

6.1 A lower bound for disjointness 

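Let $\mathrm{OR}_n : \{-1,+1\}^n \to \{-1,+1\}$ be the OR function on $n$ bits, and let $\mathrm{DISJ}_{k,n} : (\{-1,+1\}^n)^k \to
\{-1,+1\}$ be defined as $\mathrm{DISJ}_{k,n}(x_1, \ldots, x_k) = -\mathrm{OR}_n(x_1 \wedge x_2 \wedge \cdots \wedge x_k)$.

By embedding a pattern tensor into the tensor $\mathrm{DISJ}_{k,n}$, we can get the following lower bound.

Corollary 24

\[
R_{1/4}(\mathrm{DISJ}_{k,n}) = \Omega\left( \frac{n^{1/(k+1)}}{2^{2^k}} \right).
\]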

Proof: The idea of the proof will be to embed an appropriate pattern tensor into $\mathrm{DISJ}_{k,n}$ and apply
Theorem 23. Let $c_k = 5e(k-1)2^{2^{k-1}}$. As Nisan and Szegedy have shown $\deg_3(\mathrm{OR}_n) \ge \sqrt{n/6}$,
we wish to define integers $m, M$ such that $M \ge c_k\sqrt{m}$ and $mM^{k-1} \le n$. To this end, let
$m = \lfloor (n/(2c_k)^{k-1})^{2/(k+1)} \rfloor$ and $M = \lceil c_k \sqrt{m} \rceil$. Let $n' = mM^{k-1}$. One can easily check that $n' \le n$.

We will now see that the pattern tensor $(k, M, \mathrm{OR}_m)$ is a subtensor of $\mathrm{OR}_{n'}(x_1 \wedge \cdots \wedge x_k)$.
This will then give the result by the obvious reduction to $\mathrm{DISJ}_{k,n}$.

Let $A$ be the $(k, M, \mathrm{OR}_m)$ pattern tensor. Recall that

\[
A[x, y_1, \ldots, y_{k-1}] = \mathrm{OR}_m(x^1[y_1[1], \ldots, y_{k-1}[1]], \ldots, x^m[y_1[m], \ldots, y_{k-1}[m]]),
\]

where each $y_j[i] \in [M]$, and $x^i$ is a $k-1$ dimensional tensor of side length $M$. To each $y_j[i]$ we
associate a $k-1$ tensor $z_j^i$ of side length $M$, where $z_j^i[t_1, \ldots, t_{k-1}] = 1$ if and only if $t_j = y_j[i]$.
In this way, $x^1[y_1[1], \ldots, y_{k-1}[1]] = \mathrm{OR}_{M^{k-1}}(x^1 \wedge z_1^1 \wedge \cdots \wedge z_{k-1}^1)$. Letting $z_j = (z_j^1, \ldots, z_j^m)$ we
have

\[
\mathrm{OR}_{n'}(x \wedge z_1 \wedge \cdots \wedge z_{k-1}) = \mathrm{OR}_m(\mathrm{OR}_{M^{k-1}}(x^1 \wedge z_1^1 \wedge \cdots \wedge z_{k-1}^1), \ldots, \mathrm{OR}_{M^{k-1}}(x^m \wedge z_1^m \wedge \cdots \wedge z_{k-1}^m)).
\]

This shows that $A$ is a subtensor of $-\mathrm{DISJ}_{k,n'}$. The result now follows from Theorem 23 and
Theorem 7. □
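The parameter choice in this proof is easy to tabulate. The sketch below assumes the reading $m = \lfloor (n/(2c_k)^{k-1})^{2/(k+1)} \rfloor$ and $M = \lceil c_k\sqrt{m} \rceil$ with $c_k = 5e(k-1)2^{2^{k-1}}$; this is one choice that satisfies both constraints $M \ge c_k\sqrt{m}$ and $mM^{k-1} \le n$, and the function name is hypothetical.

```python
from math import e, sqrt, floor, ceil

def pattern_parameters(n, k):
    """Return (c_k, m, M, n_prime) for embedding a pattern tensor into DISJ_{k,n}.

    Assumes m = floor((n / (2 c_k)^(k-1))^(2/(k+1))) and M = ceil(c_k sqrt(m)),
    so that n_prime = m * M^(k-1) does not exceed n.
    """
    c_k = 5 * e * (k - 1) * 2 ** (2 ** (k - 1))
    m = floor((n / (2 * c_k) ** (k - 1)) ** (2 / (k + 1)))
    M = ceil(c_k * sqrt(m))
    return c_k, m, M, m * M ** (k - 1)

# Example inputs chosen large enough that m >= 1.
params_two_players = pattern_parameters(10 ** 6, 2)
params_three_players = pattern_parameters(10 ** 12, 3)
```

Since $\lceil c_k \sqrt{m} \rceil \le 2 c_k \sqrt{m}$ whenever $m \ge 1$, this choice gives $n' \le (2c_k)^{k-1} m^{(k+1)/2} \le n$, which is the check left to the reader in the proof.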



Remark 25 Note that a statement similar to that of Corollary 24 can be proved for any symmetric
function, not just OR. But for some functions (e.g. threshold functions with threshold a constant
fraction of $n$) much better bounds can be proved by reduction to inner product. For this reason, we
do not include the general statement here.

6.2 Proof systems 

In this section we formally define the proof systems discussed in the introduction, and the lower 
bounds which follow from our results on disjointness. 

A $k$-threshold formula is a formula of the form $\sum_j \gamma_j m_j \ge t$ where $t, \gamma_j$ are integers, and each
$m_j$ is a monomial over variables $x_1, \ldots, x_n$. The size of a $k$-threshold formula is the sum of the
sizes of $\gamma_j$ and $t$, written in binary. For $k$-threshold formulas $f_1, f_2, g$, we say that $g$ is semantically
entailed by $f_1$ and $f_2$ if every 0/1 assignment to $x_1, \ldots, x_n$ that satisfies both $f_1$ and $f_2$ also satisfies
$g$.

Let $\phi$ be an unsatisfiable CNF formula with variables $x_1, \ldots, x_n$. For each clause of $\phi$ we
create a linear threshold formula which is satisfied if and only if the clause is. We refer to these
clauses as axioms. We say that $\mathcal{P}$ is a Th($k$) refutation of $\phi$ if

• $\mathcal{P}$ is a sequence $L_1, \ldots, L_t$ of $k$-threshold formulas.

• Each formula $L_j$ is either an axiom or is semantically entailed by formulas $L_i, L_{i'}$ with
$i, i' < j$.

• The final formula $L_t$ is $0 \ge 1$.

The size of $\mathcal{P}$ is the sum of the sizes of $L_1, \ldots, L_t$. We say that $\mathcal{P}$ is tree-like if the underlying
directed acyclic graph representing the implication structure of the proof is a tree.
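Semantic entailment between threshold formulas can be tested by exhaustive search on small instances. The following is a minimal sketch under an assumed encoding: a formula is a pair `(terms, t)` where each term is `(gamma, monomial)` and a monomial is a tuple of variable indices; none of these names come from [BPS06].

```python
from itertools import product

def evaluate(formula, assignment):
    """Check sum_j gamma_j * m_j(assignment) >= t for a 0/1 assignment."""
    terms, t = formula
    # all(...) is the product of the 0/1 variables in the monomial;
    # an empty monomial evaluates to the constant 1.
    total = sum(gamma * all(assignment[i] for i in mono) for gamma, mono in terms)
    return total >= t

def semantically_entails(f1, f2, g, n):
    """True if every 0/1 assignment satisfying both f1 and f2 also satisfies g."""
    return all(
        evaluate(g, a)
        for a in product((0, 1), repeat=n)
        if evaluate(f1, a) and evaluate(f2, a)
    )

# Axioms for the clauses (x0) and (not x0): x0 >= 1 and -x0 >= 0.
axiom_pos = ([(1, (0,))], 1)
axiom_neg = ([(-1, (0,))], 0)
contradiction = ([], 1)                 # the final formula 0 >= 1
refuted = semantically_entails(axiom_pos, axiom_neg, contradiction, 1)
```

The two axioms have no common satisfying assignment, so they entail $0 \ge 1$ and form a one-step refutation of the CNF $x_0 \wedge \neg x_0$. The exhaustive check takes time $2^n$ and is meant only to illustrate the definition.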

We are now ready to state the connection of [BPS06] between the number-on-the-forehead
complexity of disjointness and the size of Th($k$) proofs.

Theorem 26 (Beame, Pitassi, and Segerlind [BPS06]) Let $k \ge 2$ be a constant. For every $n$,
there is a CNF formula $\phi$ on $n$ variables such that the size of any Th($k-1$) refutation of $\phi$ is at
least

\[
\exp\left( \Omega\left( \left( \frac{R_{1/4}(\mathrm{DISJ}_{k,m})}{\log n} \right)^{1/3} \right) \right)
\]

where m "'^^^ 



2 log n ■ 

Substituting the bounds from Corollary 24 we obtain the following.

Corollary 27 Let $k \ge 2$ be a constant. For every $n$ there is a CNF formula $\phi$ over $n$ variables
which requires Th($k-1$) refutation proofs of size



\[
\exp\left( \Omega\left( \frac{n^{2/(9k+9)}}{(\log n)^{4/9} \, 2^{2^k/3}} \right) \right).
\]